Abstract

We consider the problem of testing whether a correlation matrix of a multivariate
normal population is the identity matrix. We focus on sparse classes of alternatives
where only a few entries are nonzero and, in fact, positive. We derive a general lower
bound applicable to various classes and study the performance of some near-optimal
tests. We pay special attention to computational feasibility and construct near-optimal
tests that can be computed efficiently. Finally, we apply our results to prove new lower
bounds for the clique number of high-dimensional random geometric graphs.
1 Introduction

In multivariate statistics, inference about a covariance (i.e., dispersion) matrix aims at
answering questions of dependencies between the variables. This is strictly true when the
variables are jointly Gaussian, which is the classical assumption. A basic question is whether
the variables are dependent at all. Concretely, consider a simple setting where the
components of a random vector are jointly normal, each with zero mean and unit variance.
Then the variables are independent if and only if their covariance matrix is the identity
matrix. As usual, inference is based on an i.i.d. sample of size $m$, denoted $X_1, \dots, X_m$, with
$X_t = (X_{t,1}, \dots, X_{t,n}) \in \mathbb{R}^n$ for $t = 1, \dots, m$. As stated above, we assume that
$\mathbb{E} X_{t,i} = 0$ and $\mathrm{Var}(X_{t,i}) = 1$, and let $\sigma_{i,j} = \mathrm{Cov}(X_{t,i}, X_{t,j})$.

We are interested in testing whether the population covariance matrix is the identity 
matrix, or not, so the null hypothesis is 

\[ H_0 : \sigma_{i,j} = 0, \quad \forall\, i \neq j . \]

This testing problem is well studied in the classical regime where the dimension n is fixed 
and the sample size m increases to infinity, see Muirhead (1982, Sec. 8.4). Here, we study 
the regime where the dimension is large, that is, $n \to \infty$, and focus on alternatives where
the covariance matrix is sparse in the usual sense, meaning that even under the alternative,
only a few variables are substantially correlated.

This sparse setting has been investigated in the last few years, with recent work on 
the estimation of sparse covariance matrices Bickel and Levina (2008a,b); Cai et al. (2010); 
El Karoui (2008) and inference on sparse graphical models, for example Lam and Fan (2009); 
Meinshausen and Bühlmann (2006); Rajaratnam et al. (2008); Verzelen and Villers (2010);
Yuan and Lin (2007). While the literature has focused almost exclusively on estimating the 
dependency structure, we focus here on the more basic task of testing whether there is any 
dependency at all. In this line of work, we find Chen et al. (2010); Ledoit and Wolf (2002),
where modified versions of the likelihood ratio test are proposed to handle the case where 
the dimension n increases with the sample size. This is the setting we consider, specializing 
to the case of sparse correlation structures under the alternative. 

1.1 Correlation models 

We introduce sparse models of correlation matrices to test against. Though many more 
models are possible, we choose a few emblematic examples that are of interest in a much 
wider sense within the literature on sparse covariance estimation and on sparse graphical 
models. In all cases, the null hypothesis is that the observed vector has identity covariance 
matrix. For the alternative hypothesis, we consider the following prototypical examples: 

• Block model. The covariance under the alternative is the identity matrix except for
a $k \times k$ block on the diagonal. Formally, given $\rho > 0$, we assume here that there is a
subset of indices of the form $S = \{i, \dots, i + k - 1\}$ (called a $k$-interval) such that
$\sigma_{i,j} \ge \rho$ if $i, j \in S$, $i \neq j$. The set $S$ is called the anomalous set.

• Clique model. This model is defined as the block model with the possible anomalous
set $S$ ranging over all the subsets of indices of size $k$ (called $k$-sets).

• Perfect matching model. Suppose $n$ is a perfect square with $n = k^2$. Here the
components of the observed vector $X$ correspond to edges of the complete bipartite
graph on $2k$ vertices. The alternative hypothesis is that the bipartite graph has a
perfect matching such that $\sigma_{i,j} \ge \rho$ for all $i, j \in S$, $i \neq j$, where $S$ is the anomalous set
of indices corresponding to the edges of the perfect matching.
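
To make the block and clique alternatives concrete, here is a minimal Python sketch that builds the covariance matrix in the special case where every anomalous entry equals the lower bound $\rho$. The function name `anomalous_covariance` and the 0-based indexing are illustrative assumptions, not notation from the paper.

```python
def anomalous_covariance(n, S, rho):
    """Return the n x n matrix with unit diagonal and sigma_ij = rho for i != j in S.

    All other off-diagonal entries are zero (hypothetical helper, 0-based indices).
    """
    S = set(S)
    return [
        [1.0 if i == j else (rho if (i in S and j in S) else 0.0) for j in range(n)]
        for i in range(n)
    ]

# Block model: S is a k-interval. Clique model: S is an arbitrary k-set.
block = anomalous_covariance(6, range(2, 5), 0.3)
clique = anomalous_covariance(6, [0, 3, 5], 0.3)
```

The only difference between the two models in this sketch is which index sets `S` are allowed, which is exactly what the class of candidate sets encodes.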

The block model is closely related to the models used in Cai et al. (2010) to obtain bounds 
on the minimax risk of estimating sparse matrices. Roughly speaking, Cai et al. (2010) use 
the block model with $S = \{1, \dots, k\}$ and place nonzero entries in a (carefully designed)
fashion within that block. The fraction of nonzero entries within the block is about one-half. 
We could also assume that only a fraction of the entries in the block are nonzero and it 
would only change constants later on. More importantly, to make the detection problem 
interesting, we need to consider all possible blocks. Note that the block model is parametric. 
The clique model is a natural generalization of the block model leading to a nonparametric 
model. The perfect matching model gives an example of a class of sets with a more intricate 
combinatorial structure which our approach is able to deal with. 
1.2 Tests and their risks

As usual, a test is a binary-valued function $f : \mathbb{R}^{nm} \to \{0,1\}$, with $f(X_1, \dots, X_m) = 1$
meaning that the test rejects the null $H_0$ in favor of the particular alternative of interest. We
measure the performance of a test based on its worst-case risk over the model of interest $\mathcal{M}$,
formally defined by

\[ R^{\max}(f) = P_0\{f(X_1, \dots, X_m) = 1\} + \sup_{M \in \mathcal{M}} P_M\{f(X_1, \dots, X_m) = 0\} . \]

($P_0$ denotes the distribution under the null while $P_M$ denotes the distribution under the
alternative associated with a particular covariance structure $M$.) In our setup, the worst-case risk
depends on $n$, $m$, $\rho$, and the class $\mathcal{C}$ of possible index sets $S$. When all nonzero covariances
$\sigma_{i,j}$ are actually equal to the lower bound $\rho$, then $P_M$ is determined by $S$ and, with a slight
abuse of notation, we write $P_S$ for $P_M$. Clearly,

\[ R^{\max}(f) \ge P_0\{f(X_1, \dots, X_m) = 1\} + \max_{S \in \mathcal{C}} P_S\{f(X_1, \dots, X_m) = 0\} \]

and indeed all lower bounds derived in this paper start with this inequality. We will derive
upper and lower bounds for the minimax risk,

\[ R^{\max}_* := \inf_f R^{\max}(f) , \]

where the infimum is taken over all measurable functions $f : \mathbb{R}^{nm} \to \{0, 1\}$.

The lower bounds will be obtained by putting a prior on model C and obtaining a lower 
bound on the corresponding Bayesian risk which never exceeds the worst-case risk. In all 
cases, we draw the set S uniformly at random within the class C. The upper bounds are 
obtained by studying the performance of specific tests. 

We focus on the case where the dimension n and the sample size m are both large. 
Of course, such asymptotic statements only make sense if we define sequences of integers 
$m = m_n$, $k = k_n$, $\rho = \rho_n$ and classes $\mathcal{C} = \mathcal{C}_n$. This dependency in $n$ will be left implicit.
In this asymptotic setting, we say that reliable detection is possible (resp. impossible) if
$R^{\max}_* \to 0$ (resp. $\to 1$) as $n \to \infty$. Also we say that a sequence of tests $(f_n)$ is asymptotically
powerful (resp. powerless) if $R^{\max}(f_n) \to 0$ (resp. $\to 1$).

1.3 A preview of results for the clique model

Among the models we consider, the clique model is perhaps the most compelling because 
of its relevance in applications and its complexity. Also, for a given value of $k$, the clique
model is the richest possible and therefore for any given $\rho$, $n$, $m$, $R^{\max}_*$ is larger than for any
other model. This makes the clique model an important benchmark.

Here we summarize our main findings for this special class. We discover various types 
of behavior in distinct ranges of the parameters $m$, $k$, $\rho$. Roughly speaking, and ignoring
logarithmic factors, we arrive at the following conclusions. Two tests are competing for 
near-optimality. The first one is a 'global' test akin to the classical test (Muirhead, 1982, 
Sec. 8.4) and the refinements in Chen et al. (2010); Ledoit and Wolf (2002). The second is 
a 'local' test reminiscent of the generalized likelihood ratio test. The latter dominates the
former when

\[ \frac{\max\big(1, (k/m)^{1/2}\big)\, k^{3/2}}{n} \]

is small, meaning when the alternative is more sparse (corresponding to smaller values of $k$).

Our results also point to an interesting, perhaps even surprising, phenomenon. It
turns out that if the sample size $m$ is $o(\log n)$, the risk of the optimal test is not significantly
smaller than when m = 1, that is, when only one observation is available. In other words, 
if the dimension n grows faster than exponential in the sample size m, the situation is 
essentially the same as if the sample size were equal to one. However, the situation becomes 
drastically different when m is much larger than log n as reliable detection becomes possible 
for significantly smaller values of k. 

To be more precise, consider the case when $\rho$ is a constant, independent of $n$. The lower
bound of Theorem 1 implies that it is impossible to have R^^"^ unless k"^ = o(e''"^/n). 
Thus, when the sample size $m$ is so small that $m = o(\log n)$, then a vanishing risk is
impossible to achieve unless $k \ge n^{1/2 - o(1)}$. On the other hand, if $k \gg n^{1/2}$, then already for
$m = 1$, one has $R^{\max}_* \to 0$, see Arias-Castro et al. (2012). This shows the surprising fact that
the difference between a sample of size $m = 1$ and $m = o(\log n)$ is negligible and repeated
observations do not help much.

However, when the sample size becomes logarithmic in $n$, the situation changes dramatically.
In fact, when $m = \Omega((1/\rho) \log n)$, one has $R^{\max}_* \to 0$ for all values of $k \ge 2$, and such
a vanishing risk is achieved by the localized squared-sum test described in Section 3.2.

This example reveals an interesting "phase transition" that occurs when the sample size 
becomes logarithmic in n. 

1.3.1 Computational considerations 

The "local" test that achieves near-optimal behavior in a large range of the parameters is a 
scan statistic that requires the computation of a maximum over all $\binom{n}{k}$ subsets of components
of size k. In its naive implementation, this test is computationally intractable, unless k is 
very small. We also believe that computing this test is a fundamentally hard computational 
problem. We do not have a rigorous argument to prove such a hardness result but it is worth 
pointing out that the problem is quite similar, in spirit, to the notoriously difficult hidden 
clique problem, see Alon et al. (1999). 

What performance can we achieve with limited computational power? Such questions 
of trade-off between statistical performance and computational complexity are at the heart 
of high-dimensional statistics and machine learning. We probe this question and describe a
family of tests that balances detection performance and computational complexity. 

In particular, in Section 4.4 we design a test that achieves near-optimal performance 
(similar to that of the scan statistic) and may be computed in polynomial time in $n$ when
$m = O(\log n)$, $\rho$ is a constant, and $k \asymp n^{a}$ for some $a \in (0, 1/2)$.
1.3.2 An application in the study of random geometric graphs

In Section 7 we apply the lower bound for the optimal risk in the clique model in a perhaps 
unexpected context and derive a new lower bound for the clique number of a high-dimensional 
random geometric graph. The setup is as follows. 

Consider a random geometric graph on the unit sphere in dimension m. The graph 
has n vertices, each corresponding to a random point on the unit sphere. Two vertices are 
connected by an edge if the inner product of the corresponding points is positive. In a 
recent paper, Devroye et al. (2011) studied the clique number (i.e., the size of the largest
clique in the graph) $\omega(n,m)$ of such a graph in various regimes. They showed that when
$m \sim c \log n$ for a sufficiently small constant $c$, $\omega(n,m) = n^{1-o(1)}$ with high probability, while
when $m \ge 9 \log^2 n$, $\omega(n,m) = O(\log^3 n)$. However, nothing was known about the behavior
of the clique number in between. In particular, it was unclear where exactly the clique
number becomes polylogarithmic. In Section 7 we show that the phase transition occurs at
$m \asymp \log^2 n$. In particular, we prove that for all $c > 0$, when $m \sim c \log n$, then the median
of $\omega(n, m)$ grows as a positive power of $n$ and even for $m \asymp \log^{2-\epsilon} n$, the median of $\omega(n, m)$
grows faster than any power of $\log n$, for all $\epsilon > 0$.
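
The random geometric graph described above can be simulated directly for very small $n$. The following Python sketch is only an illustration under stated assumptions: the helper names are hypothetical, the points are drawn by normalizing Gaussian vectors, and the clique number is found by brute force, which is exponential in $n$ and unrelated to the techniques of Section 7.

```python
import math
import random
from itertools import combinations

def random_sphere_points(n, m, rng):
    """Draw n points uniformly on the unit sphere in R^m by normalizing Gaussian vectors."""
    points = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(m)]
        norm = math.sqrt(sum(x * x for x in v))
        points.append([x / norm for x in v])
    return points

def geometric_graph(points):
    """Adjacency sets: i and j are joined when their inner product is positive."""
    n = len(points)
    adj = [set() for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if sum(x * y for x, y in zip(points[i], points[j])) > 0:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def clique_number(adj):
    """Brute-force clique number; only feasible for very small graphs."""
    n = len(adj)
    for size in range(n, 1, -1):
        for subset in combinations(range(n), size):
            if all(v in adj[u] for u, v in combinations(subset, 2)):
                return size
    return 1 if n else 0

# Example: n = 12 vertices on the sphere in dimension m = 3, with a fixed seed.
omega = clique_number(geometric_graph(random_sphere_points(12, 3, random.Random(0))))
```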

1.4 Related work 

As mentioned before, the literature on sparse covariance estimation and on graphical model 
estimation has become quite extensive. In spite of this surge of interest in sparse high- 
dimensional models, not much has been done in terms of detection of correlations. We note 
the work of Verzelen and Villers (2010), who consider the task of testing a given dependency 
structure. Our objective here is admittedly more modest, and a more closely related work is our
own paper Arias-Castro et al. (2012), which focuses entirely on the case where the sample
size is equal to one (i.e., $m = 1$). Our results here are seen to extend those in the one-sample
case, with the regimes now partitioned according to the sample size. While the case where
$\rho \to 1$ proved to be of interest in the case where $m = 1$, we are concerned here with the
situation where $\rho \in (0, 1)$ is either fixed or tends to zero.

Note that our work is different from Butucea and Ingster (2011) where the task is the
detection of a submatrix with higher per-coordinate mean in a large matrix with i.i.d. Gaus- 
sian entries, which is more closely related to the literature on the detection of sparse nonzero 
entries in the mean of a random vector. Our work has parallels with that literature which, 
for the clique model, focuses on the "detection-of-means" problem (see Addario-Berry et al. 
(2010); Arias-Castro et al. (2008); Baraud (2002); Donoho and Jin (2004); Hall and Jin (2010);
Ingster (1999); Jin (2003)) defined as follows: Under the null, the vectors are i.i.d. standard
normal, while under the alternative, there is a subset $S \subset \{1, \dots, n\}$ in some class $\mathcal{C}$
of interest such that the $X_t$ are i.i.d. normal with mean $(\mu_1, \dots, \mu_n)^\top$ and identity covariance,
where $\mu_i \ge \mu$ for $i \in S$ and $\mu_i = 0$ for $i \notin S$. Thus $\mu > 0$ is the minimum (per-coordinate)
signal amplitude. Of course, one immediately reduces by sufficiency to the case m = 1 by 
averaging over the sample. This explains why the literature focuses on the case m = 1. The 
connection between the detection-of-means problem with the correlation detection problem 
studied here was detailed (for m = 1) in our previous paper (Arias-Castro et al., 2012), 
where $\rho$ was found to correspond to $\mu^2$. The connection is based on the following simple
representation result for equi-correlated normal random variables.

Lemma 1 (Berman (1962)) Let $X_1, \dots, X_k$ be standard normal random variables with
$\mathrm{Cov}(X_i, X_j) = \rho > 0$ for $i \neq j$. Then there are independent standard normal random
variables $V, Y_1, \dots, Y_k$ such that $X_i = \sqrt{\rho}\, V + \sqrt{1 - \rho}\, Y_i$ for all $i$.
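
As a sanity check on the representation in Lemma 1, the covariance implied by $X_i = \sqrt{\rho}\,V + \sqrt{1-\rho}\,Y_i$ can be computed exactly: if $X = AG$ for a standard normal vector $G = (V, Y_1, \dots, Y_k)$, then $\mathrm{Cov}(X) = AA^\top$. The sketch below does this in plain Python; the function names are illustrative assumptions.

```python
import math

def equicorrelated_loading(k, rho):
    """Rows express X_i = sqrt(rho)*V + sqrt(1-rho)*Y_i in the basis (V, Y_1, ..., Y_k)."""
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    return [[a] + [b if j == i else 0.0 for j in range(k)] for i in range(k)]

def implied_covariance(A):
    """Covariance of A @ G for a standard normal vector G, i.e. the matrix A A^T."""
    return [[sum(x * y for x, y in zip(r, s)) for s in A] for r in A]

# Each X_i has unit variance and each pair has covariance rho.
cov = implied_covariance(equicorrelated_loading(4, 0.25))
```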

Thus, given $V$, the problem becomes that of detecting a subset of variables (here implicitly
assumed to be indexed by $S = \{1, \dots, k\}$) with nonzero mean (equal to $\sqrt{\rho}\,V$) and with
a variance equal to $1 - \rho$ (instead of 1). This representation was used in Arias-Castro et al.
(2012) to obtain a general lower bound that seemed otherwise out of reach of more standard 
methods based on the second moment of the likelihood ratio. 

This connection with the detection-of-means problem also applies in the case where m > 
1, but with a twist. Indeed, when detecting correlations one does not average the vectors Xt 
but their covariances. So a simple reduction to the case m = 1 does not apply. However, one 
may still apply the representation result Lemma 1 to each observation vector $X_t$, yielding
$V_t$'s and $Y_{t,i}$'s that are independent standard normal random variables. By conditioning on
$V_1, \dots, V_m$, the problem becomes equivalent to detecting a subset of variables with means
$\sqrt{\rho}\,V_t$, $t = 1, \dots, m$. What makes the situation more complex is that the signs of the $V_t$'s
are random. Our approach to finding a general lower bound is based on this representation 
without which more standard methods seem to fail. The general lower bound, which is the 
key technical result of this paper, is given in Theorem 1 below. 

1.5 Contribution and content of the paper 

We obtain a general lower bound in Section 2 akin to, but not a straightforward extension 
of, the lower bound we obtained in Arias-Castro et al. (2012). We then study a number of 
tests that are near optimal in the sense that they come close to achieving the detection lower 
bound for various models. This is done in Section 3, where we also discuss computational 
issues, particularly in the clique model. We then specialize these general results in Sections 
4, 5 and 6, to the three models described in Section 1.1. In Section 7, we apply our general 
lower bound to the problem of studying the size of the clique number of a random geometric 
graph on a high-dimensional sphere. We close the paper with a discussion in Section 8 of 
possible extensions and challenges. 

2 Lower bounds 

In this section we derive a general lower bound for the minimax risk $R^{\max}_*$. As mentioned
in Section 1.2, the first step is to restrict the supremum in the definition of $R^{\max}(f)$ to
covariance matrices in which all the nonzero entries are equal to $\rho > 0$ and then lower bound
the maximum by an average. In particular, we have $R^{\max}_* \ge R^*$ where $R^* = \inf_f R(f)$ and

\[ R(f) = P_0\{f(X_1, \dots, X_m) = 1\} + \frac{1}{|\mathcal{C}|} \sum_{S \in \mathcal{C}} P_S\{f(X_1, \dots, X_m) = 0\} . \]

Note that $R^*$ is just the Bayes risk for the uniform prior on the models $S \in \mathcal{C}$. It is well
known that the test $f^*$ that achieves the infimum (i.e., $R(f^*) = R^*$) is the likelihood ratio
test that accepts the null if and only if the likelihood ratio $(1/|\mathcal{C}|) \sum_{S \in \mathcal{C}} \phi_S(X)/\phi_0(X) \le 1$.
Here $\phi_0$ is the standard normal density in $\mathbb{R}^{nm}$ and $\phi_S$ is the normal density with covariance
matrix defined by $S$. We focus on the case where $\rho$ is bounded away from 1.

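Theorem 1 For any class $\mathcal{C}$, any $\rho \in [0, 0.9)$, and any $a \ge \sqrt{8}$,

\[ R^* \ge P\{\chi^2_m \le a^2 m\} \Big( 1 - \frac{1}{2} \sqrt{E \cosh^m(\nu_a Z) - 1} \Big) , \]

where $\chi^2_m$ has chi-squared distribution with $m$ degrees of freedom, $\nu_a := \rho a^2/(1 - \rho^2)$, and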
Z is the size of the intersection of two elements of C drawn independently and uniformly at 
random. In particular, taking a = 



R*>1- ijEcosh™ ( -^Z 1 - 1 



2 A\j VI -P^ 

while $R^* \to 1$ if there is $a \to \infty$ such that $E \cosh^m(\nu_a Z) \to 1$.

To appreciate the relative simplicity of the arguments that follow, we encourage the 
reader to try the bread-and-butter second moment method, which amounts to bounding the 
variance of the likelihood ratio directly. We avoid this by representing the data in a different 
way. 

Proof. Under the alternative $H_1$, $X \in \mathbb{R}^{m \times n}$ can be written as

\[ X_{t,i} = \begin{cases} Y_{t,i} & \text{if } i \notin S,\ t \in [m], \\ \sqrt{\rho}\, V_t + \sqrt{1-\rho}\, Y_{t,i} & \text{if } i \in S,\ t \in [m], \end{cases} \tag{2.1} \]

where $(Y_{t,i})_{i \in [n], t \in [m]}$, $(V_t)_{t \in [m]}$ are i.i.d. standard normal random variables.

Let $\varepsilon_t = \mathrm{sign}(V_t)$, which are i.i.d. Rademacher, and $U_t = |V_t|$. Denote by $U$ the
$m$-vector with components $(U_1, \dots, U_m)$. We consider now the alternative $H_1(u)$, defined as
the alternative $H_1$ given $U = u \in \mathbb{R}^m$. Let $R(f)$, $L$, $f^*$ (resp. $R_u(f)$, $L_u$, $f^*_u$) be the risk
of a test $f$, the likelihood ratio, and the optimal (likelihood ratio) test, for $H_0$ versus $H_1$
(resp. $H_0$ versus $H_1(u)$). For any $u \in \mathbb{R}^m$, $R_u(f^*_u) \le R_u(f^*)$, by the optimality of $f^*_u$ for $H_0$
vs. $H_1(u)$. Therefore, conditioning on $U$,

\[ R^* = R(f^*) = E_U R_U(f^*) \ge E_U R_U(f^*_U) = 1 - \frac{1}{2} E_U E_0 |L_U(X) - 1| . \]

($E_U$ is the expectation with respect to $U$.) Using the fact that $E_0|L_u(X) - 1| \le 2$ for all $u$,
we have (with $B(0, a)$ being the Euclidean ball of radius $a$ in $\mathbb{R}^m$)

\[ E_U E_0|L_U(X) - 1| \le 2 P\{\|U\| > a\sqrt{m}\} + P\{\|U\| \le a\sqrt{m}\} \max_{u \in B(0, a\sqrt{m})} E_0|L_u(X) - 1| . \]

Therefore, using the Cauchy-Schwarz inequality,

\[ 1 - \frac{1}{2} E_U E_0|L_U(X) - 1| \ge P\{\|U\| \le a\sqrt{m}\} \Big( 1 - \frac{1}{2} \max_{u \in B(0, a\sqrt{m})} E_0|L_u(X) - 1| \Big) \]
\[ \ge P\{\|U\| \le a\sqrt{m}\} \Big( 1 - \frac{1}{2} \max_{u \in B(0, a\sqrt{m})} \sqrt{E_0 L_u^2(X) - 1} \Big) . \]

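We turn our attention to bounding $E_0 L_u^2(X)$ from above. Let $L_{u,\varepsilon,S}(x)$ denote the
likelihood ratio when $S$ is anomalous, given $u$ and $\varepsilon$, which is equal to

\[ L_{u,\varepsilon,S}(x) = (1-\rho)^{-mk/2} \exp\Big( \sum_{t=1}^m \sum_{i \in S} \Big[ \frac{x_{t,i}^2}{2} - \frac{(x_{t,i} - \sqrt{\rho}\,\varepsilon_t u_t)^2}{2(1-\rho)} \Big] \Big) . \]

Since $L_u(x) = E_\varepsilon E_S L_{u,\varepsilon,S}(x)$, by Fubini's theorem, we have

\[ E_0 L_u(X)^2 = E_{S,S'} E_{\varepsilon,\varepsilon'} E_0 L_{u,\varepsilon,S}(X) L_{u,\varepsilon',S'}(X) \]

where $\varepsilon, \varepsilon'$ are i.i.d. Rademacher vectors and $S, S'$ are i.i.d. uniform in the class $\mathcal{C}$. We have

\[ L_{u,\varepsilon,S}(x) L_{u,\varepsilon',S'}(x) = (1-\rho)^{-mk} \exp\big( H_1(x) + H_2(x) + H_3(x) \big) , \]

where

\[ H_1(x) = \sum_{t=1}^m \sum_{i \in S \cap S'} \Big( x_{t,i}^2 - \frac{(x_{t,i} - \sqrt{\rho}\,\varepsilon_t u_t)^2}{2(1-\rho)} - \frac{(x_{t,i} - \sqrt{\rho}\,\varepsilon'_t u_t)^2}{2(1-\rho)} \Big) , \]
\[ H_2(x) = \sum_{t=1}^m \sum_{i \in S \setminus S'} \Big( \frac{x_{t,i}^2}{2} - \frac{(x_{t,i} - \sqrt{\rho}\,\varepsilon_t u_t)^2}{2(1-\rho)} \Big) , \]
\[ H_3(x) = \sum_{t=1}^m \sum_{i \in S' \setminus S} \Big( \frac{x_{t,i}^2}{2} - \frac{(x_{t,i} - \sqrt{\rho}\,\varepsilon'_t u_t)^2}{2(1-\rho)} \Big) . \]

Let $Z = |S \cap S'|$. We see that $H_1(X), H_2(X), H_3(X)$ are independent of each other under
the null with

\[ E_0 \exp(H_2(X)) = (1-\rho)^{m|S \setminus S'|/2} = (1-\rho)^{m(k-Z)/2} \]

and similarly for $E_0 \exp(H_3(X))$, while

\[ E_0 \exp(H_1(X)) = \Big( \frac{1-\rho}{1+\rho} \Big)^{mZ/2} \exp\Big( \sum_{t=1}^m \frac{\rho Z u_t^2 (\varepsilon_t \varepsilon'_t - \rho)}{1-\rho^2} \Big) . \]

For the latter, we used the fact that $\varepsilon_t^2 = \varepsilon_t'^2 = 1$, to get

\[ \int \exp\Big( x^2 - \frac{(x - \sqrt{\rho}\,\varepsilon_t u_t)^2}{2(1-\rho)} - \frac{(x - \sqrt{\rho}\,\varepsilon'_t u_t)^2}{2(1-\rho)} \Big) \frac{e^{-x^2/2}}{\sqrt{2\pi}}\, dx \]
\[ = \exp\Big( \frac{\rho u_t^2 (1 + \varepsilon_t \varepsilon'_t)}{1-\rho^2} - \frac{\rho u_t^2}{1-\rho} \Big) \int \exp\Big( -\frac{1+\rho}{2(1-\rho)} \Big( x - \frac{\sqrt{\rho}\,(\varepsilon_t + \varepsilon'_t) u_t}{1+\rho} \Big)^2 \Big) \frac{dx}{\sqrt{2\pi}} \]
\[ = \sqrt{\frac{1-\rho}{1+\rho}}\, \exp\Big( \frac{\rho u_t^2 (\varepsilon_t \varepsilon'_t - \rho)}{1-\rho^2} \Big) , \]

where the last line comes from a simple change of variables. Hence,

\[ E_0 L_{u,\varepsilon,S}(X) L_{u,\varepsilon',S'}(X) = (1-\rho^2)^{-mZ/2} \exp\Big( \sum_{t=1}^m \frac{\rho Z u_t^2 (\varepsilon_t \varepsilon'_t - \rho)}{1-\rho^2} \Big) . \]

Since $(\varepsilon_t \varepsilon'_t : t = 1, \dots, m)$ are i.i.d. Rademacher, we have

\[ E_{\varepsilon,\varepsilon'} E_0 L_{u,\varepsilon,S}(X) L_{u,\varepsilon',S'}(X) = (1-\rho^2)^{-mZ/2} \prod_{t=1}^m \cosh\Big( \frac{\rho Z u_t^2}{1-\rho^2} \Big) \exp\Big( -\frac{\rho^2 Z u_t^2}{1-\rho^2} \Big) . \]

Define

\[ \gamma := \frac{\rho Z}{1-\rho^2} , \qquad b_t := \gamma u_t^2 , \quad t = 1, \dots, m . \]

Maximizing

\[ \prod_{t=1}^m \cosh(b_t) \exp(-\rho b_t) \quad \text{subject to} \quad \sum_{t=1}^m b_t \le m \gamma a^2 , \tag{2.2} \]
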

using Lagrangian multipliers and checking the Karush-Kuhn-Tucker conditions, we find that
at a local maximum all the $b_t$ must be equal, so that the value of the optimization problem
(2.2) is equal to

\[ \Big( \max_{0 \le c \le \gamma a^2} \cosh(c) \exp(-\rho c) \Big)^m . \tag{2.3} \]

The function $c \mapsto \cosh(c) \exp(-\rho c)$ is decreasing on $(0, \tanh^{-1}\rho)$ and increasing on
$(\tanh^{-1}\rho, \infty)$, since the derivative of its logarithm is $\tanh(c) - \rho$. Therefore the maximum is
either at $c = 0$ or $c = \gamma a^2$, and comparing the two, we have

\[ \cosh(c) \exp(-\rho c) = 1 \iff g(c) := \frac{1}{c} \log\cosh(c) = \rho . \]

Straightforward calculations show that $g$ is strictly increasing on $(0, \infty)$ with range $(0, 1)$.
Therefore $w(\rho) = g^{-1}(\rho)$ is well-defined and, as a function of $\rho$, is infinitely differentiable
and increasing. Hence, the maximum of (2.3) is at $c = \gamma a^2$ if $w(\rho) \le \gamma a^2$. Again, elementary
calculations show that

f w „ wtanh^(w) 



w = — — ^ , w 



tanh(w) — p' (tanh(w) — p) 



2' 



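implying in particular that $w$ is convex. Numerically we find that $w(0.9) < 7$, so that (by
convexity) $w(\rho) \le 8\rho$ for all $\rho \le 0.9$, and $w(\rho) \le \gamma a^2$ when $a \ge \sqrt{8}$ and $Z \ge 1$.
Since we assume that $a \ge \sqrt{8}$ and $\rho < 0.9$, (2.3) is equal to

\[ \cosh^m(\gamma a^2) \exp(-m \rho \gamma a^2) . \]

(This is also true when $Z = 0$.) Hence,

\[ E_{\varepsilon,\varepsilon'} E_0 L_{u,\varepsilon,S}(X) L_{u,\varepsilon',S'}(X) = (1-\rho^2)^{-mZ/2} \prod_{t=1}^m \cosh(b_t) \exp(-\rho b_t) \]
\[ \le \cosh^m\Big( \frac{\rho a^2 Z}{1-\rho^2} \Big) \exp\Big( -\frac{m \rho^2 a^2 Z}{1-\rho^2} - \frac{mZ}{2} \log(1-\rho^2) \Big) \]
\[ \le \cosh^m\Big( \frac{\rho a^2 Z}{1-\rho^2} \Big) , \]

where in the last inequality we used the fact that $a \ge 1$ and $\frac{s}{1-s} + \log(1 - s) \ge 0$ for all
$s \in (0, 1)$. □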
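
The numerical claim $w(0.9) < 7$ used in the proof can be checked with a few lines of Python. This is a minimal sketch, assuming $g(c) = \log\cosh(c)/c$ as defined in the proof and inverting it by bisection; the tolerance and bracketing values are arbitrary choices.

```python
import math

def g(c):
    """g(c) = log(cosh(c)) / c, written in a form that does not overflow for large c."""
    return (c + math.log1p(math.exp(-2.0 * c)) - math.log(2.0)) / c

def w(rho, tol=1e-10):
    """Solve g(c) = rho for c > 0 by bisection (g is increasing with range (0, 1))."""
    lo, hi = 1e-6, 1.0
    while g(hi) < rho:       # grow the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < rho:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

w09 = w(0.9)  # numerically close to 10 * log(2)
```

On a grid of values of $\rho$ up to 0.9 one can also confirm the consequence $w(\rho) \le 8\rho$ that the proof draws from convexity.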

In Sections 4, 5, and 6, we specialize Theorem 1 to the different models we described
in Section 1.1. Throughout, we assume that $\rho$ is bounded away from 1. Specifically, we fix
$\rho_0 < 1$ and consider $\rho \le \rho_0$ (in the proof above we chose $\rho_0 = 0.9$, but the proof technique
works for any $\rho_0 < 1$). We also assume that $k/n \to 0$. When $k \asymp n$, then $R^* \to 1$ if
$\rho n \sqrt{m} \to 0$. This is a straightforward application of our result using the bound $Z \le n$. We
leave the (easy) details to the reader.



3 Tests 

In this section we introduce and briefly discuss two natural tests that will be seen to per- 
form near optimally in various regimes of the parameters. This optimality property will 
be established in Sections 4, 5, and 6, by comparing simple performance bounds with the 
implications of Theorem 1. 

The first test, that we call "squared-sum test", is based on a global test statistic that 
does not take the class C into account at all. 

The second test, a "localized" squared-sum test, is based on a simple scan statistic. It 
may also be interpreted as a simplified version of the generalized likelihood ratio test. 

As we will see, one of the two tests above always has a near-optimal performance in all 
three specific classes we discuss. Thus, the story is essentially complete from the point of view
of detection performance. Unfortunately, the localized squared-sum test is hard to compute. 
We discuss two possible substitutes. The first one is a simple "maximum correlation test" 
that turns out to be nearly optimal for very small values of k. In Section 4.4 we discuss 
another test in the context of the clique model that is both near-optimal and computationally 
feasible when the sample size is at most logarithmic in the dimension n. 

All performance bounds derived below are in terms of the average correlation 

\[ \rho_{\mathrm{ave}} = \frac{1}{k(k-1)} \sum_{i, j \in S,\, i \neq j} \sigma_{i,j} \ge \rho , \tag{3.1} \]

where S is the anomalous set. 

3.1 The squared-sum test 

Consider the squared-sum test that rejects for large values of the test statistic 

\[ Y = \sum_{t=1}^{m} \Big( \sum_{i=1}^{n} X_{t,i} \Big)^2 . \tag{3.2} \]

Such a test relies on the fact that all correlations are non-negative under the alternative 
hypothesis. The following result gives a simple characterization of the performance of the 
squared-sum test. Since the test does not use information about the class C, its minimax 
risk does not depend on the model either. 
Proposition 1 The squared-sum test that rejects $H_0$ when $Y > nm + m^{3/4} k \sqrt{n\rho}$ is
asymptotically powerful when $\rho_{\mathrm{ave}} \sqrt{m}\, k^2/n \to \infty$. The squared-sum test with any threshold value
for $Y$ is asymptotically powerless when $\rho_{\mathrm{ave}} \sqrt{m}\, k^2/n \to 0$.

Proof. Suppose that $a := \rho_{\mathrm{ave}} \sqrt{m}\, k^2/n \to \infty$. Under the null $Y \sim n \chi^2_m$, so that

\[ P_0\big( Y > n(m + \sqrt{am}) \big) \to 0, \quad n \to \infty . \]

Under the alternative $Y \sim (n + \rho_{\mathrm{ave}} k(k-1)) \chi^2_m$, so that

\[ P_1\big( Y < (n + \rho_{\mathrm{ave}} k(k-1))(m - \sqrt{am}) \big) \to 0, \quad n \to \infty . \]

Since

\[ (n + \rho_{\mathrm{ave}} k(k-1))(m - \sqrt{am}) - n(m + \sqrt{am}) = n\sqrt{m}\,(a + o(a)) \to \infty , \]

the test with critical value $n(m + \sqrt{am})$ is asymptotically powerful.

Suppose that $a \to 0$. We still have that $Y/n \sim \chi^2_m$ under $H_0$ while
$Y/n \sim (1 + \rho_{\mathrm{ave}} k(k-1)/n) \chi^2_m$ under $H_1$. If $m$ is fixed, $Y/n$ is asymptotically $\chi^2_m$ under the
alternative since $\rho_{\mathrm{ave}} k(k-1)/n \to 0$ in that case. If $m \to \infty$, $(Y/n - m)/\sqrt{2m}$ is asymptotically
standard normal both under the null and the alternative. The latter is due to Slutsky's Theorem,
since

\[ \frac{Y/n - m}{\sqrt{2m}} \sim \frac{\chi^2_m - m}{\sqrt{2m}} + \frac{\rho_{\mathrm{ave}} k(k-1)}{n} \frac{\chi^2_m}{\sqrt{2m}} = \frac{\chi^2_m - m}{\sqrt{2m}} + o_P(1) , \]

using the fact that $\chi^2_m = O_P(m)$. □
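
The statistic (3.2) and the critical value $n(m + \sqrt{am})$ used in the proof are simple to compute. The sketch below is illustrative only: the data layout (a list of $m$ rows, each of length $n$) and the function names are assumptions, and the critical value requires a value of $\rho_{\mathrm{ave}}$ to be supplied.

```python
import math

def squared_sum_statistic(X):
    """Y = sum over observations t of (sum over coordinates i of X[t][i]) squared."""
    return sum(sum(row) ** 2 for row in X)

def squared_sum_critical_value(n, m, k, rho_ave):
    """Critical value n * (m + sqrt(a * m)) with a = rho_ave * sqrt(m) * k^2 / n."""
    a = rho_ave * math.sqrt(m) * k * k / n
    return n * (m + math.sqrt(a * m))

# Tiny worked example with m = 2 observations in dimension n = 2.
y = squared_sum_statistic([[1, 2], [3, -1]])        # (1 + 2)^2 + (3 - 1)^2
crit = squared_sum_critical_value(100, 4, 10, 0.5)  # here a = 1, so crit = 100 * (4 + 2)
```

Computing $Y$ takes $O(nm)$ operations, which is why this global test serves as the computationally cheap baseline.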



3.2 A localized squared-sum test 

When k is smaller, global tests such as the squared-sum test are not very powerful. The 
generalized likelihood ratio test "scans" over all subsets S in the class C. Instead of studying 
the generalized likelihood ratio test, we consider a localized version of the squared-sum test 
that has similar power and is a little easier to analyze. The localized squared-sum test rejects 
for large values of the test statistic 

\[ Y_{\mathrm{scan}} = \max_{S \in \mathcal{C}} Y_S , \quad \text{where} \quad Y_S = \sum_{t=1}^{m} \Big( \sum_{i \in S} X_{t,i} \Big)^2 . \]

The following result gives sufficient conditions for the test to be asymptotically powerful. 
The conditions are in terms of the cardinality of the class C. Sharper bounds that take 
into account the fine metric structure of C are also possible by more careful bounding of the 
distribution of $Y_{\mathrm{scan}}$ under the null. However, as we will see below, this bound is already quite
sharp for the specific classes considered in this paper, and to preserve relative simplicity of 
the arguments we do not pursue sharper bounds here. 
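
A naive implementation of the localized statistic simply enumerates the class of candidate sets. The following Python sketch does this for $k$-intervals and for all $k$-sets; the helper names and the 0-based indexing are assumptions made for illustration, and the enumeration over $k$-sets is only feasible for very small $n$ and $k$.

```python
from itertools import combinations

def local_statistic(X, S):
    """Y_S = sum over observations t of (sum of X[t][i] for i in S) squared."""
    return sum(sum(row[i] for i in S) ** 2 for row in X)

def scan_statistic(X, candidate_sets):
    """Y_scan = maximum of Y_S over the supplied class of candidate sets."""
    return max(local_statistic(X, S) for S in candidate_sets)

def k_intervals(n, k):
    """Candidate sets of the block model: {i, ..., i + k - 1}."""
    return [range(i, i + k) for i in range(n - k + 1)]

def k_sets(n, k):
    """Candidate sets of the clique model: all subsets of size k."""
    return combinations(range(n), k)

X = [[1, -1, 2], [0, 1, 1]]
y_block = scan_statistic(X, k_intervals(3, 2))
y_clique = scan_statistic(X, k_sets(3, 2))
```

Since every $k$-interval is also a $k$-set, the clique-model scan is never smaller than the block-model scan on the same data.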

Proposition 2 The localized squared-sum test that rejects the null hypothesis if

\[ Y_{\mathrm{scan}} > \frac{1}{2} \Big( \rho_{\mathrm{ave}} k^2 m + \big( H^{-1}(3 \log|\mathcal{C}|/m) - 1 \big) km \Big) , \]

is asymptotically powerful when

\[ \rho_{\mathrm{ave}} k \ge 2 H^{-1}(3 \log(|\mathcal{C}|)/m) , \]

where $H(b) := b - 1 - \log b$ for $b > 1$.

Proof. Under the null $Y_S \sim k \chi^2_m$ for all $S \in \mathcal{C}$, with

\[ P\{\chi^2_m > bm\} \le \exp[-(m/2)(b - 1 - \log b)] , \quad \forall b \ge 1 , \tag{3.3} \]

by Chernoff's bound. Hence, with the union bound,

\[ P_0(Y_{\mathrm{scan}} > bkm) \le |\mathcal{C}| \exp[-(m/2)(b - 1 - \log b)] . \]

By letting $b = H^{-1}(3 \log|\mathcal{C}|/m)$, the probability above tends to zero.

Under the alternative where $S$ is anomalous, $Y_S \sim (k + \rho_{\mathrm{ave}} k(k-1)) \chi^2_m$, so that

\[ Y_{\mathrm{scan}} \ge Y_S \ge km + \rho_{\mathrm{ave}} k^2 m - O_P\big( (k + \rho_{\mathrm{ave}} k^2) \sqrt{m} \big) . \]

Hence, the test is asymptotically powerful when $\rho_{\mathrm{ave}} k^2 m \ge 2km \big( H^{-1}(3 \log|\mathcal{C}|/m) - 1 \big)$. □

Note that when $b \to 1$, we have $H(b) \sim (b-1)^2/2$ and therefore in the case when
$(\log|\mathcal{C}|)/m \to 0$, the test is asymptotically powerful for $\rho_{\mathrm{ave}} k \ge A \sqrt{(\log|\mathcal{C}|)/m}$ for a constant
$A$ sufficiently large. On the other hand, $H(b) \sim b$ when $b \to \infty$, so in the case when
$(\log|\mathcal{C}|)/m \to \infty$, the sufficient condition for $\rho_{\mathrm{ave}}$ is that $\rho_{\mathrm{ave}} k \ge A (\log|\mathcal{C}|)/m$. Put
another way, a sufficient condition for the test to be asymptotically powerful is that

\[ \rho_{\mathrm{ave}} k \ge A \max\Big( \sqrt{(\log|\mathcal{C}|)/m},\ (\log|\mathcal{C}|)/m \Big) . \]
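
The function $H(b) = b - 1 - \log b$ has no closed-form inverse, but it is increasing on $[1, \infty)$, so $H^{-1}$ can be evaluated numerically. A minimal sketch by bisection follows; the function names and the tolerance are illustrative assumptions.

```python
import math

def H(b):
    """H(b) = b - 1 - log(b), increasing on [1, infinity)."""
    return b - 1.0 - math.log(b)

def H_inv(y, tol=1e-12):
    """Return b >= 1 with H(b) = y, for y >= 0, by bisection."""
    lo, hi = 1.0, 2.0
    while H(hi) < y:          # grow the bracket until it contains the solution
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if H(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# For the clique model, log|C| is the log of a binomial coefficient.
log_card = math.log(math.comb(50, 5))
b = H_inv(3.0 * log_card / 200)   # value of H^{-1}(3 log|C| / m) with m = 200
```

Near $b = 1$ the quadratic approximation $H(b) \approx (b-1)^2/2$ quoted above is easy to confirm numerically with this helper.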

When the class C is large (i.e., has size exponential in k), the test statistic may be difficult 
to compute as it involves solving a nontrivial combinatorial optimization problem. This is 
the case for the clique model (unless k is very small) and the matching model. In fact, we 
believe that the problem of computing, or even approximating, $Y_{\mathrm{scan}}$ is fundamentally hard,
though we do not have a formal argument to prove it. An even stronger conjecture is that, for 
the clique model, computing any test with a near-optimal performance is fundamentally hard 
in some range of the parameters. We believe this is a challenging and important research 
problem. In Section 3.3 we suggest a test that has good performance and that is efficiently 
computable if m is only logarithmic in n. 



3.3 Maximum correlation test 

Finally, we mention possibly the simplest test that one would think of when confronted with
testing $H_0$ in the sparse regime. This is the test that rejects for large values of the maximum
pairwise empirical correlation

\[ Y_{\max} = \max_{i \neq j} \sum_{t=1}^{m} X_{t,i} X_{t,j} . \]

In fact, this test does have some power in the sparse regime, and is actually near-optimal when
$k$ is fixed as the following result shows. However, one cannot expect a good performance of
this test for large values of $k$. An advantage of this test is that it may be computed efficiently
in a straightforward manner.
Corollary 1 The maximum correlation test that rejects $H_0$ when $Y_{\max} > \sqrt{5 m \log n}$ is
asymptotically powerful when

\[ \rho_{\mathrm{ave}} \ge \sqrt{5 (\log n)/m} . \]

Proof. Assume that $m \ge 5 \log n$ for otherwise the statement is void. For $i \neq j$ fixed, under
the null, $X_{t,i} X_{t,j}$, $t = 1, \dots, m$, are i.i.d. with zero mean, unit variance, and finite moment
generating function in a neighborhood of the origin. In fact, it is equal to $(1 - \lambda^2)^{-1/2}$
for $\lambda \in (-1, 1)$. Hence, by a standard result on moderate deviations (Dembo and Zeitouni,
2010, Th 3.7.1),

\[ \limsup_{m \to \infty} \frac{m}{b_m^2} \log P_0\Big\{ \sum_{t=1}^{m} X_{t,i} X_{t,j} > b_m \Big\} \le -\frac{1}{2} \]

for any sequence $(b_m)$ such that $\sqrt{m} \ll b_m \ll m$. We choose $b_m = \sqrt{5 m \log n}$ and use the
union bound, to get

\[ P_0\{Y_{\max} > \sqrt{5 m \log n}\} \le n^2\, P_0\Big\{ \sum_{t=1}^{m} X_{t,i} X_{t,j} > b_m \Big\} \to 0 . \]

Under the alternative when $S \subset [n]$ is anomalous, pick $i \neq j$ in $S$ such that $X_{t,i} X_{t,j}$,
$t = 1, \dots, m$, are i.i.d. with mean at least $\rho_{\mathrm{ave}}$ and variance smaller than 2, so that by
Chebyshev's inequality,

\[ \sum_{t=1}^{m} X_{t,i} X_{t,j} \ge m \rho_{\mathrm{ave}} + O_P(\sqrt{m}) . \]

From this, the result follows immediately. □
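
The maximum correlation statistic and the threshold $\sqrt{5 m \log n}$ from Corollary 1 can be computed in $O(n^2 m)$ time. The sketch below assumes the same data layout as before (a list of $m$ rows of length $n$); the function names are hypothetical.

```python
import math

def max_correlation_statistic(X):
    """Y_max = max over pairs i < j of the sum over t of X[t][i] * X[t][j]."""
    n = len(X[0])
    return max(
        sum(row[i] * row[j] for row in X)
        for i in range(n)
        for j in range(i + 1, n)
    )

def max_correlation_threshold(n, m):
    """Rejection threshold sqrt(5 * m * log(n)) from Corollary 1."""
    return math.sqrt(5.0 * m * math.log(n))

X = [[1, 2, 3], [1, -1, 0]]
y_max = max_correlation_statistic(X)   # the pair (1, 2) gives 2*3 + (-1)*0
reject = y_max > max_correlation_threshold(len(X[0]), len(X))
```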

Note that the maximum correlation test is more powerful than the squared-sum test
when

\[ \frac{k^2}{n} \sqrt{\log n} \to 0 . \]



4 Clique model 

In this section we focus on the clique model. We derive corollaries of Theorem 1 in
various ranges of the parameters and compare them with the performance bounds for the 
squared-sum test and the scan statistics-based test considered in Section 3. 



4.1 Lower bounds for the clique model 

In order to apply Theorem 1, note that in the clique model, $Z$ is hypergeometric with
parameters $(n, k, k)$, which is stochastically bounded by the binomial distribution with parameters
$(k, k/(n-k))$.
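
For small $n$ and $k$, the quantity $E\cosh^m(\nu_a Z)$ appearing in Theorem 1 can be evaluated exactly from the hypergeometric distribution of $Z$. The Python sketch below is a numerical illustration only; the function names are assumptions, and $\nu_a = \rho a^2/(1-\rho^2)$ is taken from the statement of the theorem.

```python
import math

def intersection_pmf(n, k):
    """P(Z = z), z = 0..k, for Z = |S ∩ S'| with S, S' independent uniform k-subsets of [n]."""
    total = math.comb(n, k)
    return [math.comb(k, z) * math.comb(n - k, k - z) / total for z in range(k + 1)]

def expected_cosh_power(n, k, m, rho, a):
    """E cosh^m(nu_a * Z) with nu_a = rho * a^2 / (1 - rho^2), in the clique model."""
    nu = rho * a * a / (1.0 - rho * rho)
    return sum(p * math.cosh(nu * z) ** m for z, p in enumerate(intersection_pmf(n, k)))

pmf = intersection_pmf(20, 4)
mean_z = sum(z * p for z, p in enumerate(pmf))   # equals k^2 / n for the hypergeometric law
value = expected_cosh_power(20, 4, 5, 0.05, math.sqrt(8.0))
```

The expectation equals 1 when $\rho = 0$ and increases with $\rho$, $m$, and $k$, which is the behavior the case analysis below quantifies.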

Recall that by assumption, there exists a $\rho_0 < 1$ such that for all $n$, $\rho \le \rho_0$. We distinguish
various regimes of the parameters in which the minimax risk and the optimal test behave 
differently. 
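Case 1. Suppose

$$\frac{k}{n} \to 0 \quad\text{and}\quad \frac{k^2}{n} \to \infty \quad\text{and}\quad \rho\sqrt{m}\,\frac{k^2}{n} \to 0 \quad\text{and}\quad \begin{cases}\text{either } \rho m \to 0, \\ \text{or } \rho\sqrt{mk} \to 0.\end{cases} \qquad (4.1)$$

Let $\zeta = \rho\sqrt{m}\,k^2/n$ and choose $a \to \infty$ such that $a^2\zeta \to 0$. When $Z \le 8k^2/n$, we use the fact that $\rho a^2 Z \to 0$ and $\cosh(x) = 1 + x^2/2 + o(x^2)$ when $x \to 0$ to get that for all sufficiently large $n$,

$$\cosh^m(\nu_a Z) = \big(1 + (\nu_a Z)^2/2 + o(\nu_a Z)^2\big)^m \le \exp(64\,C_0^2 a^4 \zeta^2) = 1 + o(1),$$

where $C_0 = (1-\rho_0)^{-1}$. It suffices to show that

$$\mathbb{E}\big[\cosh^m(\nu_a Z)\,1\{Z > 8k^2/n\}\big] = o(1)\,.$$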

If $\rho m \to 0$, we choose $a$ such that $a^2\rho m \to 0$. We use the bound $\cosh(x) \le \exp(x)$ and Bennett's inequality, to get

$$\mathbb{E}\big[\cosh^m(\nu_a Z)\,1\{Z > 8k^2/n\}\big] \le \sum_{z > 8k^2/n} \exp\big(-z(1 - C_0 a^2 \rho m)\big) \le \exp(-7k^2/n) \to 0,$$

eventually.

If $\rho\sqrt{mk} \to 0$, we may assume that $k \le m$ for otherwise $\rho m \to 0$, which we already covered. We choose $a$ such that $a^2\rho\sqrt{mk} \to 0$ (which implies in particular $a^2\rho k \to 0$). We use the bounds $\cosh(x) \le 1 + x^2/2 + o(x^2)$ and $Z \le k$, and the fact that $a^2\rho k \to 0$, to get

$$\cosh^m(\nu_a Z) \le \exp(C_0^2 a^4 \rho^2 m Z^2) \le \exp(C_0^2 a^4 \rho^2 m k Z),$$

eventually. Combined with Bennett's inequality, we get

$$\mathbb{E}\big[\cosh^m(\nu_a Z)\,1\{Z > 8k^2/n\}\big] \le \sum_{z > 8k^2/n} \exp\big(-z(1 - C_0^2 a^4 \rho^2 m k)\big) \le \exp(-7k^2/n) \to 0,$$

eventually.
Case 2. Suppose

$$\frac{k^2}{n} \to 0 \quad\text{and}\quad \rho\sqrt{\frac{mk}{\log(n/k^2)}} \to 0 \quad\text{and}\quad \frac{k\log(n/k^2)}{m} \to 0. \qquad (4.2)$$

Let $\zeta = \rho\sqrt{mk/\log(n/k^2)}$ and choose $a \to \infty$ such that $a^2\zeta \to 0$ and $a^2\rho k \to 0$. The latter is possible because (4.2) implies that $\rho k \to 0$. Then

$$\cosh^m(\nu_a Z) = \big(1 + (\nu_a Z)^2/2 + o(\nu_a Z)^2\big)^m \le \exp(C_0^2 a^4 \rho^2 m Z^2),$$

eventually, where we used the bound $Z \le k$ and $\nu_a Z = O(a^2\rho k) = o(1)$. We then bound $Z^2 \le kZ$ and use the fact that $Z$ is stochastically bounded by $\mathrm{Bin}(k, k/(n-k))$, and knowing the moment generating function of the latter, we have

$$\mathbb{E}\cosh^m(\nu_a Z) \le \Big(1 + \frac{k}{n-k}\,e^{C_0^2 a^4 \rho^2 m k}\Big)^k \le \exp\Big(\exp\big(-\log(n/k^2)(1 - C_0^2 a^4 \zeta^2)\big)\Big) = 1 + o(1)\,.$$

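Case 3. Suppose

$$\frac{k^2}{n} \to 0 \quad\text{and}\quad \rho\,\frac{m}{\log(n/k^2)} \to 0. \qquad (4.3)$$

Let $\zeta = \rho m/\log(n/k^2)$ and choose $a \to \infty$ such that $a^2\zeta \to 0$. We use the fact that $\cosh(x) \le \exp(x)$, and use the same bound on the moment generating function of $Z$, to get (eventually)

$$\mathbb{E}\cosh^m(\nu_a Z) \le \mathbb{E}\exp(m\nu_a Z) \le \Big(1 + \frac{k}{n-k}\,e^{C_0 a^2 \rho m}\Big)^k \le \exp\Big(\exp\big(-\log(n/k^2)(1 - C_0 a^2 \zeta)\big)\Big) = 1 + o(1)\,.$$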

This leads to the following. 

Corollary 2 In the clique model, under either (4.1), (4.2), or (4.3), $R^* \to 1$.

We will see below that (4.1) is tight up to logarithmic factors. This is also the case of (4.2) and (4.3), unless $k^2/n \to 0$ as a negative power of $n$. Note also that the result is silent in the regime when $k^2/n \to 0$ and $\rho\sqrt{m}\,k^2/n \to 0$. However, it is covered by (4.2) when $\log(n/k^2)/k \to 0$, and by (4.3) when $\log(n/k^2)/\sqrt{m} \to 0$, so again it is a matter of logarithmic factors. We mention that the typical exposition in the detection-of-means literature, for example in Donoho and Jin (2004), avoids the discussion of such fine details by assuming that $k = n^\alpha$ for some $\alpha \in (0,1)$.



4.2 Localized squared-sum test 

Next we take a closer look at the performance of the localized squared-sum test for the clique model. In this case we have $|\mathcal{C}| = \binom{n}{k}$ so $\log|\mathcal{C}| \sim k\log(n/k)$. Plugging this into Proposition 2, we see that the local squared-sum test is reliable when

$$\rho_{\rm ave} \ge (2/k)\,H^{-1}\big(3k\log(n/k)/m\big)\,.$$

Based on this and Corollary 2, we conclude that the test is near-optimal in regimes (4.2) and (4.3), though only up to a logarithmic factor if $k^2/n \to 0$ slower than any power of $n$.

However, we do not have such a guarantee in the regime (4.1). In this range of parameters, it is the squared-sum test that yields an optimal performance (up to a logarithmic factor). Also, comparing Proposition 1 and Proposition 2, we see that the local test dominates when $\max\big(1, (k/m)^{1/2}\big)\,k^{3/2}/n$ tends to zero faster than $1/\log(n/k)$.
4.3 The case of constant $\rho$

Now we discuss the simple but interesting case when $\rho$ is bounded away from zero. For simplicity, we may assume that $\rho$ is a constant, independent of $n$.

From our previous work Arias-Castro et al. (2012) (and also from Theorem 1) in the case of $m = 1$, we know that when $\rho < 1$ is constant, $R^* \to 1$ unless $k^2/n \to \infty$. Now we learn from Corollary 2 that $R^* \to 1$ also when $k^2/n \to 0$ and $m = o(\log(n/k^2))$. Hence, in the case of $\rho$ constant and $n/k^2$ a positive power of $n$, a sample size $m$ sub-logarithmic in the dimension $n$ is not enough for reliable detection, and is qualitatively on par with the case of $m = 1$.

However, the situation dramatically changes when the sample size becomes at least logarithmic in the dimension $n$. Indeed, even for $k = 2$, both the localized squared-sum test and the maximum correlation test have a vanishing risk for any constant value of $\rho$ when $m/\log n \to \infty$. This reveals a very interesting "phase transition" occurring when the sample size is about logarithmic in the dimension.



4.4 Balancing detection and running time 

Given the often enormous size of data sets that statisticians need to handle as an every-day 
practice, it is of great interest to design computationally efficient, yet near-optimal tests. In 
the case of the clique model, this is a highly non-trivial task, because the class C has size 
exponential in k and therefore computing the localized squared-sum test (or other versions of 
the generalized likelihood ratio test and scan statistics) involves a non-trivial optimization 
problem over all elements of C. In fact, often it seems that small testing risk and 
computational efficiency are contradicting terms. In this section we show that in at least 
one non-trivial instance, it is possible to design a computationally efficient (i.e., computable 
in time quadratic in n) test that has near optimal risk. 

This is the case when the sample size $m$ is (at most) logarithmic in $n$ and $k \asymp n^\alpha$ for some $\alpha \in (0,1)$. (Recall from Section 4.3 that this is a quite interesting range of parameters.)

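To introduce a family of tests that balance detection performance and computational complexity, let $\ell \in \{1,\dots,m\}$ and define

$$Y(\ell) = \max_{S : |S| = k}\ \max_{T : |T| = \ell}\ \sum_{t \in T}\sum_{i \in S} X_{t,i}\,.$$

Since, for each fixed $T$,

$$\max_{S : |S| = k}\ \sum_{t \in T}\sum_{i \in S} X_{t,i} = \sum_{i=n-k+1}^{n} X_{T,(i)},$$

where $X_{T,(1)} \le \cdots \le X_{T,(n)}$ are the ordered values of $X_{T,i} := \sum_{t \in T} X_{t,i}$, the statistic $Y(\ell)$ can be computed in $O\big(\binom{m}{\ell}(n\log(n)\,k + \ell\log(m))\big)$ time by first sorting $(X_{T,i} : i = 1,\dots,n)$ and summing the largest $k$, for all subsets $T$ of size $\ell$, and then maximizing over these.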
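The sort-and-sum computation of $Y(\ell)$ can be sketched as follows. This is a minimal illustration under our reading of the statistic (column sums over the rows in $T$, then the $k$ largest column sums); the function name and the list-of-rows layout are assumptions, and the enumeration over all $\binom{m}{\ell}$ subsets is only practical when $m$ is small.

```python
from itertools import combinations

def scan_statistic(X, k, ell):
    """Compute Y(ell) = max over |S| = k, |T| = ell of sum_{t in T, i in S} X[t][i].

    X is assumed to be a list of m rows of length n. For each subset T of
    rows we sum the columns over T, sort, and add up the k largest sums,
    which solves the inner maximization over S exactly.
    """
    m, n = len(X), len(X[0])
    best = float("-inf")
    for T in combinations(range(m), ell):
        col_sums = sorted(sum(X[t][i] for t in T) for i in range(n))
        best = max(best, sum(col_sums[n - k:]))
    return best
```

A small call such as `scan_statistic([[1, 2, 3], [0, 5, -1], [4, 0, 0]], k=2, ell=2)` can be compared against a brute-force maximum over all pairs $(S, T)$ to confirm that sorting replaces the search over $S$.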

For example, when $m \le \log_2 n$, then $\binom{m}{\ell} \le 2^m \le n$ and the test may be computed in time $O(n^2\log n)$. Even when $m \sim C\log n$ for some constant $C > 0$, we may choose $\ell \sim \gamma m$ such that $C\gamma\log(1/\gamma) \le 1$. In that case $\binom{m}{\ell} \le 2^{\ell\log_2(m/\ell)} \le n$ and again the test may be computed in time $O(n^2\log n)$. The next proposition bounds the risk of the test.
Proposition 3 The test based on $Y(\ell)$ with $\ell \le m/7$ is asymptotically powerful in the clique model when

$$\rho_{\rm ave} \ge 5\left(\frac{\log(n/k)}{\ell} + \frac{\log(m/\ell)}{k}\right).$$



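Proof. Since under the null $\sum_{t \in T}\sum_{i \in S} X_{t,i} \sim \mathcal{N}(0, \ell k)$, by a standard bound for the maximum of a finite set of Gaussian variables,

$$Y(\ell) \le \sqrt{2\ell k \log\Big(\binom{m}{\ell}\binom{n}{k}\Big)} \sim \sqrt{2\ell k\big(\ell\log(m/\ell) + k\log(n/k)\big)}$$

with high probability. Under the alternative where $S$ is anomalous, we have

$$Y(\ell) \ge \sqrt{k + \rho_{\rm ave}\,k(k-1)}\,\big(Z_{(m-\ell+1)} + \cdots + Z_{(m)}\big),$$

where $Z_{(1)} \le \cdots \le Z_{(m)}$ are the ordered values of

$$Z_t := \big(k + \rho_{\rm ave}\,k(k-1)\big)^{-1/2} \sum_{i \in S} X_{t,i},$$

which are i.i.d. standard normal. Since we assume that $m \to \infty$, we have that $\mathbb{P}(Z_{(m-\ell+1)} \ge 1) \to 1$ when $\ell/m \le 1/7 < \mathbb{P}\{\mathcal{N}(0,1) > 1\}$. Hence, $Y(\ell) \ge \sqrt{\rho_{\rm ave}}\,k\ell$ with probability tending to one under the alternative. From this the result follows. □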
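Two numerical ingredients of this proof are easy to check: the constant $1/7$ is indeed below the standard normal upper tail at 1, and the null and alternative levels can be evaluated for given parameters. The helper names below are hypothetical, and the levels are the leading-order expressions appearing in the proof, not exact quantiles.

```python
import math

def normal_upper_tail(x):
    """P(N(0,1) > x), computed from the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def null_level(m, n, k, ell):
    """sqrt(2 ell k log(C(m, ell) C(n, k))): the high-probability bound under H0."""
    log_count = math.log(math.comb(m, ell)) + math.log(math.comb(n, k))
    return math.sqrt(2.0 * ell * k * log_count)

def alternative_level(rho_ave, k, ell):
    """sqrt(rho_ave) * k * ell: the lower bound on Y(ell) under the alternative."""
    return math.sqrt(rho_ave) * k * ell
```

Comparing `alternative_level(rho_ave, k, ell)` with `null_level(m, n, k, ell)` for concrete values gives a rough feel for when the two regimes separate.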

In the regime of (4.3) with $m \sim C\log n$, we see that the test is optimal up to a constant factor in $\rho$ when $k \asymp n^\alpha$ for some $\alpha < 1/2$. In this range of parameters, it seems hopeless to compute (or even approximate) the local squared-sum test.

However, when m is much larger than logarithmic in n, this test also requires super- 
polynomial computational time and therefore it is not useful in practice. In such cases one 
may have to resort to sub-optimal tests such as the maximum correlation test described 
in Section 3.3. It is an important and difficult challenge to find out the possibilities and 
limitations of powerful detection taking computational constraints into account. 



5 Block model 

Next we discuss the consequences of our main results for the block model which serves as an 
easy and prototypical example of "small" or "parametric" classes. 

In this model, to apply Theorem 1, we may use the obvious bound $Z \le \bar{Z} := k\,1\{S \cap S' \ne \emptyset\}$. Noting that $\mathbb{P}(S \cap S' \ne \emptyset) \le 2k/n$, we have

$$\mathbb{E}\cosh^m(\nu_a Z) \le 1 + \frac{2k}{n}\cosh^m(\nu_a k)\,.$$
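The overlap probability used above can be verified by direct counting. The sketch below assumes that the block model consists of the $n$ cyclic intervals $\{i, i+1, \dots, i+k-1\}$ modulo $n$ (an assumption on our part, chosen so that the class has $n$ elements); the function names are hypothetical.

```python
def blocks_overlap(i, j, k, n):
    """True if the cyclic blocks starting at i and j (length k, mod n) intersect."""
    a = {(i + r) % n for r in range(k)}
    b = {(j + r) % n for r in range(k)}
    return bool(a & b)

def overlap_probability(k, n):
    """P(S and S' intersect) for two independent, uniformly chosen blocks.

    By rotation invariance we may fix the first block to start at 0.
    """
    hits = sum(blocks_overlap(0, j, k, n) for j in range(n))
    return hits / n
```

Under this assumption the count is $2k - 1$ out of $n$ starting positions whenever $2k - 1 \le n$, which is consistent with the bound $2k/n$.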

We distinguish between two main regimes. 
• Case 1. Suppose

$$\rho k\sqrt{\frac{m}{\log(n/k)}} \to 0 \quad\text{and}\quad \frac{k}{n} \to 0 \quad\text{and}\quad \frac{\log(n/k)}{m} \to 0. \qquad (5.1)$$

Let $\zeta = \rho k\sqrt{m/\log(n/k)}$ and choose $a \to \infty$ such that $a^2\zeta \to 0$ and $a^2\rho k \to 0$. The latter is possible because (5.1) implies that $\rho k \to 0$. We use the bound $\cosh(x) = 1 + x^2/2 + o(x^2)$ to get

$$\cosh^m(k\nu_a) = \big(1 + (k\nu_a)^2/2 + o(k\nu_a)^2\big)^m \le \exp(C_0^2 a^4 \rho^2 k^2 m),$$

for $n$ sufficiently large. Then

$$\frac{2k}{n}\exp(C_0^2 a^4 \rho^2 k^2 m) = 2\exp\big(-\log(n/k)(1 - C_0^2 a^4 \zeta^2)\big) \to 0,$$

by our assumptions.

• Case 2. Suppose

$$\rho k\,\frac{m}{\log(n/k)} \to 0 \quad\text{and}\quad \frac{k}{n} \to 0. \qquad (5.2)$$

Let $\zeta = \rho k m/\log(n/k)$ and choose $a \to \infty$ such that $a^2\zeta \to 0$. We use the bound $\cosh(x) \le \exp(x)$ to get

$$\frac{2k}{n}\cosh^m(k\nu_a) \le 2\exp\big(-\log(n/k)(1 - C_0 a^2 \zeta)\big) \to 0\,.$$
This leads to the following. 

Corollary 3 In the block model, under either (5.1) or (5.2), $R^* \to 1$.

In view of Corollary 3, the squared-sum test is near-optimal for the block model only when $k \asymp n$. However, the localized squared-sum test has a much better performance. We have $|\mathcal{C}| = n$, and plugging this into Proposition 2, we see that the localized squared-sum test is reliable when

$$\rho_{\rm ave}\,k \ge 2H^{-1}\big(3(\log n)/m\big)\,.$$

With Corollary 3, we conclude that the test is near-optimal except in the case where $k/n \to 0$ slower than any negative power of $n$, where the test is optimal up to a logarithmic factor.



6 Perfect matching model 

Here we work out the corollaries of our main results for the perfect matching model. This 
model illustrates how one may proceed when the model in question has a non-trivial com- 
binatorial structure. In order to use Theorem 1, one needs to use the specific properties of 
the class. 
In the perfect matching model, $Z$ is distributed as the number of fixed points in a random permutation over $\{1,\dots,k\}$. It is well known that

$$\mathbb{P}(Z = z) \le \frac{1}{z!}\left(\frac{1}{e} + \frac{1}{(k-z+1)!}\right), \qquad \forall z \in \{0,\dots,k\}. \qquad (6.1)$$
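The bound (6.1) can be checked against the exact law of the number of fixed points, which is $\mathbb{P}(Z = z) = \frac{1}{z!}\sum_{j=0}^{k-z} (-1)^j/j!$ by the classical derangement formula. The following is a small sketch with hypothetical function names; it uses floating point, so comparisons carry a tiny tolerance.

```python
from math import e, factorial

def fixed_point_pmf(k, z):
    """Exact P(Z = z): z fixed points in a uniform random permutation of k items."""
    partial = sum((-1) ** j / factorial(j) for j in range(k - z + 1))
    return partial / factorial(z)

def bound_61(k, z):
    """Right-hand side of (6.1): (1/z!) * (1/e + 1/(k - z + 1)!)."""
    return (1.0 / e + 1.0 / factorial(k - z + 1)) / factorial(z)

def bound_holds(k, tol=1e-15):
    """Check (6.1) for every z in {0, ..., k}."""
    return all(fixed_point_pmf(k, z) <= bound_61(k, z) + tol for z in range(k + 1))
```

The alternating series for $1/e$ explains the bound: truncating after $k - z$ terms leaves an error of at most $1/(k-z+1)!$.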

We distinguish between two main regimes. To simplify notation, we assume that k is even. 

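• Case 1. Suppose

$$\rho\sqrt{k\max(k, m)} \to 0. \qquad (6.2)$$

We choose $a \to \infty$ such that $a^2\rho\sqrt{k\max(k, m)} \to 0$. We use the bounds $\cosh(x) = 1 + x^2/2 + o(x^2)$ and $Z \le k$, and the fact that $a^2\rho k \to 0$, to get, for $n$ sufficiently large,

$$\cosh^m(\nu_a Z) \le \exp(C_0^2 a^4 \rho^2 m Z^2) \le \exp(C_0^2 a^4 \rho^2 m k Z)\,.$$

Now let $c = C_0^2 a^4 \rho^2 m k$. Using (6.1), one obtains

$$\begin{aligned}
\mathbb{E}\cosh^m(\nu_a Z) &\le \mathbb{E}\exp(cZ) \\
&\le \sum_{z=0}^{k} \frac{1}{z!}\left(\frac{1}{e} + \frac{1}{(k-z+1)!}\right)\exp(cz) \\
&\le \exp(\exp(c) - 1) + \frac{k+1}{(k/2+1)!}\exp(ck) \\
&\le 1 + o(1),
\end{aligned}$$

because $c \to 0$ and $\log[(k/2+1)!] \sim (k/2)\log k$ as $k \to \infty$.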
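• Case 2. Suppose

$$\rho\,\frac{m}{\log(\min(k, m))} \to 0. \qquad (6.3)$$

We choose $a \to \infty$ such that $a^2\rho m/\log(\min(k, m)) \to 0$. Using (6.1) one obtains

$$\begin{aligned}
\mathbb{E}\cosh^m(\nu_a Z) &\le \mathbb{E}\cosh^m(\nu_a Z)\,1\{Z < k/2\} + \mathbb{P}\{Z \ge k/2\}\cosh^m(\nu_a k) \\
&\le \sum_{z=0}^{k/2-1} \frac{1}{z!}\left(\frac{1}{e} + \frac{1}{(k/2)!}\right)\cosh^m(\nu_a z) + \mathbb{P}\{Z \ge k/2\}\exp(\nu_a m k) \\
&\le \frac{1}{e}\sum_{z=0}^{\infty} \frac{1}{z!}\cosh^m(\nu_a z) + \left(\frac{k}{(k/2)!} + \mathbb{P}\{Z \ge k/2\}\right)\exp(\nu_a m k)\,.
\end{aligned}$$

Now we take care separately of these last two terms. First note that

$$\frac{k}{(k/2)!} + \mathbb{P}\{Z \ge k/2\} \le \frac{3k}{(k/2)!} \le \exp(-\beta\,k\log k)$$

for some numerical constant $\beta > 0$ when $k$ is large enough, and since $\nu_a m k = O(a^2\rho m k) = o(k\log k)$ by our choice of $a$, we obtain

$$\left(\frac{k}{(k/2)!} + \mathbb{P}\{Z \ge k/2\}\right)\exp(\nu_a m k) \to 0\,.$$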
For the other term the situation is slightly more subtle. Let $F$ be a sum of $m$ independent Rademacher random variables. Using the binomial identity it is easy to prove that

$$\cosh^m(\nu_a z) = \mathbb{E}\exp(\nu_a z F),$$

and thus we have

$$\frac{1}{e}\sum_{z=0}^{\infty} \frac{1}{z!}\cosh^m(\nu_a z) = \mathbb{E}\big[\exp(\exp(\nu_a F) - 1)\big]\,.$$
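Both identities above can be confirmed numerically for small $m$. The sketch below uses the fact that a sum of $m$ Rademacher signs equals $2J - m$ with $J \sim \mathrm{Bin}(m, 1/2)$; the function names and the truncation level of the series are our own choices.

```python
from math import comb, cosh, e, exp, factorial

def mgf_rademacher_sum(x, m):
    """E exp(x F) for F a sum of m independent Rademacher signs."""
    return sum(comb(m, j) * exp(x * (2 * j - m)) for j in range(m + 1)) / 2 ** m

def poisson_mixture_lhs(nu, m, terms=80):
    """(1/e) * sum_z cosh(nu z)^m / z!, truncated after `terms` summands."""
    return sum(cosh(nu * z) ** m / factorial(z) for z in range(terms)) / e

def poisson_mixture_rhs(nu, m):
    """E exp(exp(nu F) - 1), computed exactly from the binomial law of F."""
    return sum(
        comb(m, j) * exp(exp(nu * (2 * j - m)) - 1.0) for j in range(m + 1)
    ) / 2 ** m
```

The first function should agree with `cosh(x) ** m`, and the last two with each other, up to floating-point and truncation error.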

Now thanks to Hoeffding's inequality, we obtain for any $t > 0$,

$$\mathbb{E}\big[\exp(\exp(\nu_a F) - 1)\big] \le \exp(\exp(\nu_a t) - 1) + \exp\big(-t^2/(2m)\big)\exp(\exp(\nu_a m) - 1)\,.$$

In particular with $t = m/\log m$, using that $\nu_a m = O(a^2\rho m) = o(\log m)$ by our choice of $a$, this shows that

$$\mathbb{E}\big[\exp(\exp(\nu_a F) - 1)\big] = 1 + o(1)\,.$$

This leads to the following. 

Corollary 4 Consider the class of perfect matchings on the complete bipartite graph. Under either (6.2) or (6.3), $R^* \to 1$.

It is easy to derive upper bounds for the performance of the localized squared-sum test in this model. All we need to observe is that $|\mathcal{C}| = k!$ and therefore $\log|\mathcal{C}| \sim k\log k$ when $k \to \infty$. Plugging this into Proposition 2, we see that the local squared-sum test is reliable when

$$\rho_{\rm ave} \ge (2/k)\,H^{-1}\big(3k(\log k)/m\big)\,.$$

Thus ignoring logarithmic factors, the requirement is that $\rho_{\rm ave}\sqrt{m\min(k, m)}$ be large. Looking at Corollary 4, the complement of (6.2) or (6.3) corresponds to $\rho\sqrt{k\max(k, m)} \to \infty$ and $\rho m/\log\min(k, m) \to \infty$ (essentially). Ignoring logarithmic factors, the requirement is that $\rho\sqrt{m\min(k, m)}$ be large. Thus the local squared-sum test is near-optimal.

7 The clique number of random geometric graphs 

In this section we describe a perhaps unexpected application of Theorem 1. We use this theorem to derive a lower bound for the clique number of random geometric graphs on high-dimensional spheres.

To describe the problem, let $p \in (0,1)$ and let $Z_1,\dots,Z_n$ be independent random vectors, uniformly distributed on the unit sphere $S_{d-1} = \{x \in \mathbb{R}^d : \|x\| = 1\}$. A random geometric graph $G(n,d,p)$ is defined by vertex set $V = \{1,\dots,n\}$ and vertex $i$ and vertex $j$ are connected by an edge if and only if $\langle Z_i, Z_j\rangle \ge t_{p,d}$ where the threshold value $t_{p,d}$ is such that

$$\mathbb{P}\{\langle Z_1, Z_2\rangle \ge t_{p,d}\} = p$$

(i.e., the probability that an edge is present equals $p$). The clique number $\omega(n,d,p)$ is the size of the largest clique of $G(n,d,p)$ (i.e., the largest fully connected subset of vertices).
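For very small $n$ the construction can be simulated directly. The sketch below covers only the case $p = 1/2$, where the threshold is $t_{1/2,d} = 0$ by symmetry; uniform points on the sphere are obtained by normalizing Gaussian vectors, and the clique number is found by exhaustive search, which is exponential in $n$ and meant purely as an illustration. All names are hypothetical.

```python
import math
import random
from itertools import combinations

def uniform_on_sphere(d, rng):
    """Draw one point uniformly on the unit sphere of R^d (normalized Gaussian)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]

def geometric_graph(points, threshold=0.0):
    """Adjacency matrix: i ~ j iff <Z_i, Z_j> >= threshold (0 corresponds to p = 1/2)."""
    n = len(points)
    adj = [[False] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        inner = sum(a * b for a, b in zip(points[i], points[j]))
        if inner >= threshold:
            adj[i][j] = adj[j][i] = True
    return adj

def clique_number(adj):
    """Size of the largest clique, by brute force over vertex subsets."""
    n = len(adj)
    for size in range(n, 0, -1):
        for S in combinations(range(n), size):
            if all(adj[i][j] for i, j in combinations(S, 2)):
                return size
    return 0
```

A typical use is `clique_number(geometric_graph([uniform_on_sphere(d, rng) for _ in range(n)]))` with `rng = random.Random(seed)` and $n$ of the order of ten or twenty.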






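In Devroye et al. (2011) the behavior of the random variable $\omega(n,d,p)$ is studied for fixed values of $p$ when $n$ is large and $d = d_n$ grows as a function of $n$. The rate of growth of $\omega(n,d,p)$ is shown to depend in a crucial way on how fast $d_n$ increases with $n$. Specifically, the following results are established (and hold with probability converging to 1 as $n \to \infty$):

$$\begin{array}{ll}
d_n = O(1): & \omega(n,d,p) = \Omega(n) \\
d_n \to \infty: & \omega(n,d,p) = o(n) \\
d_n = o(\log n): & \mathbb{E}\,\omega(n,d,p) = \Omega(n^{1-\epsilon}) \ \text{for all } \epsilon > 0 \\
d_n \ge 9\log^2 n: & \omega(n,d,p) = O(\log^3 n) \\
d_n/\log^3 n \to \infty: & \omega(n,d,p) = (2 + o(1))\log_{1/p} n
\end{array}$$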

We see that the clique number behaves in drastically different ways between $d_n = o(\log n)$ (when $\omega(n,d,p)$ grows almost linearly) and $d_n \sim \log^2 n$ (when $\omega(n,d,p)$ has a poly-logarithmic growth at most).

The above-mentioned results leave open the question of where exactly the "phase transition" occurs, and whether the upper bound in the regime $d_n \sim \log^2 n$ is sharp. In this section we are able to answer both of these questions. Below we establish a general lower bound for the clique number which implies that, perhaps surprisingly, the phase transition occurs around $\log^2 n$ and that the upper bounds above cannot be improved in an essential way. We show that the median of the clique number $\omega(n,d,p)$ is bounded from below by $\exp(\kappa\log^2 n/d)$ where $\kappa$ is a positive constant that depends on $p$ only. This implies, for example, that if $d \sim c\log n$ for some $c > 0$, then $\omega(n,d,p)$ grows as a positive power of $n$. On the other hand, even when $d \sim \log^{2-\epsilon} n$ for any fixed $\epsilon > 0$, then $\omega(n,d,p)$ is much larger than any power of $\log n$. For the sake of simplicity, we only state the result for the case of $p = 1/2$. The argument is identical for other values of $p$.

Theorem 2 There exist universal constants $c_1, c_2, c_3, c_4 > 0$ such that for all $n, d$ such that $d \ge c_1\log(c_2 n)$, the median of the clique number $\omega(n,d,1/2)$ satisfies

$$\mathrm{med}\big(\omega(n,d,1/2)\big) \ge c_3\exp\left(\frac{c_4\log^2(c_2 n)}{d}\right).$$

One may take $c_1 = 7/16$, $c_2 = 16\log 2$, $c_3 = 1/16$, and $c_4 = 49/5120$.

Proof. The basic idea of the proof is to define a test that works well whenever the median 
clique number is small. But then the lower bound of Theorem 1 implies that the clique 
number cannot be small. 

Let $\omega_0 = \mathrm{med}(\omega(n,d,1/2))$ for short. Consider the clique model with $m = d$, all nonzero correlations equal to $\rho$ and $k = 16\omega_0$. For $i = 1,\dots,n$, let $X^{(i)} = (X_{1,i},\dots,X_{d,i}) \in \mathbb{R}^d$ and define the random geometric graph $G$ on the normalized vectors $Z_i = X^{(i)}/\|X^{(i)}\|$, connecting points $Z_i$ and $Z_j$ whenever $\langle Z_i, Z_j\rangle \ge 0$. The test statistic we consider is the clique number of the resulting graph, denoted by $\omega$. (This test was suggested and analyzed in Devroye et al. (2011). Here we combine their analysis with Theorem 1 to derive a lower bound for the median clique number.)

Under the null hypothesis (when $\rho = 0$), the $Z_i$'s are i.i.d. uniform on the sphere $S_{d-1}$ implying that $G \sim G(n,d,1/2)$ and, consequently, $\omega \sim \omega(n,d,1/2)$. Devroye et al. (2011) show that, under the alternative hypothesis, with probability at least 7/8, the graph contains a clique of size $k$ whenever

< (l/8)e'^'''/^°. (7.1) 



When this is the case, the test that accepts the null hypothesis when $\omega < k$ has a probability of type II error bounded by 1/8. To bound the probability of type I error of this test, we first prove that $\mathbb{E}_0\omega \le 2\omega_0$ for any $d$ and $n$ sufficiently large. We start with

$$\mathbb{E}_0\omega > 2\omega_0 \ \Rightarrow\ \mathbb{E}_0\omega - \omega_0 > \tfrac12\mathbb{E}_0\omega \ \Rightarrow\ \tfrac12\mathbb{E}_0\omega < \mathbb{E}_0\omega - \omega_0 \le \sqrt{\mathrm{var}(\omega)},$$

where in the last step we used the well-known fact that the difference between the mean and the median of any random variable is bounded by its standard deviation. Now observe that $\omega$, as a function of the independent random variables $Z_1,\dots,Z_n$, is a configuration function in the sense of Talagrand (1995) which implies that $\mathrm{var}(\omega) \le \mathbb{E}_0\omega$ (Boucheron et al., 2004, Corollary 2). We arrive at

$$\mathbb{E}_0\omega > 2\omega_0 \ \Rightarrow\ \tfrac12\mathbb{E}_0\omega < \sqrt{\mathbb{E}_0\omega} \ \Leftrightarrow\ \mathbb{E}_0\omega < 4\,.$$

However, it is a simple matter to show that $\mathbb{E}_0\omega > 4$ for all $d$ if $n$ is sufficiently large. (To see this it suffices to show that the probability that 5 random points form a clique is bounded away from zero. This follows from the arguments of Devroye et al. (2011).) We then bound the probability of type I error as follows

$$\mathbb{P}_0\{\omega \ge k\} = \mathbb{P}_0\{\omega \ge 16\omega_0\} \le \mathbb{P}_0\{\omega \ge 8\,\mathbb{E}_0\omega\} \le \frac18,$$

where we used Markov's inequality in the last line.

Combining the bounds on the probabilities of type I and type II errors, we conclude that $R^* \le 1/4$. Put another way, (7.1) implies $R^* \le 1/4$.



Now, by Theorem 1, we see that 

{IGcuof < 4(ln2)ne-^^'"^/^ ^ R* > 1/4. 

We conclude that, for any $\rho \in (0,1)$,

(IGwo)' < 4(ln2)ne-^6'"^/^ =^ (16a;o)' > (l/4)e'^^'/^°. 

Therefore, if p is such that 4(ln2)ne-^*^^'^/^ > (l/4)e'^'''/^°, then {IGuof > (l/4)e'^'''/^°. 
Choosing $\rho = (7/(16d))\log((16\log 2)n)$, which is possible since $d \ge (7/16)\log((16\log 2)n)$, clearly satisfies the required inequality and this choice gives rise to the announced lower bound. □
8 Discussion



The cornerstone of our analysis is the lower bound stated in Theorem 1. It is powerful 
enough that we can deduce useful bounds in many different models, which are seen to be 
optimal up to constant or logarithmic factors. While considerable effort has been devoted in the related detection-of-means problem to finding the right constants, one wonders if it is possible to obtain results that fine here, at least in some regimes. One possible avenue
for that is via the truncated second moment approach, which underlies the lower bounds 
in Butucea and Ingster (2011); Donoho and Jin (2004); Hall and Jin (2010); Ingster (1999). 
The computations are rather daunting in the setup of this paper and we decided not to 
take that route. Note that the second moment approach (without truncation) has limited 
applicability, though it is a little more useful here than it is in the case where m = 1. 

More generally, the problem of detecting correlations of arbitrary sign (not just positive correlations, as we do here) remains open. Though one can design natural tests akin to our squared-sum and local squared-sum tests for that situation, the challenge is in deriving tight lower bounds. We mention that our approach to obtaining a lower bound in Section 2 does not apply here, since the representation (2.1) is not valid when the correlations are negative.